Date’s problem

Author

Santiago Sotelo

Published

January 25, 2023

Problem

  • The date-time variables appear in different formats in the data downloaded from BecharaReport API.
  • Some values of date-time variables does not reflect the date-time of the user considering their timezone.

Objectives

  1. Describe the problem.

  2. Explain the origin of the problem.

For the purpose of this report, we are going to use the data from BecharaReport API gathered in 2023-01-25. This data has 799362 observations and 80 variables, and can be found in the following Dropbox path:

Mobio Interactive Dropbox/Research/RWD/Shared R Scripts/Input csv/BecharaReportFull-2023-01-25T17_00_16.csv.

Obj 1: Describing the problem

In the AmDTx data downloaded from BecharaReport API we have the following 10 date-time variables:

Definitions from BecharaReport+
Variable Definition
created Timestamp of account creation
session_start Unique timestamp of AmDTx session start
session_finish Unique timestamp of AmDTx session finish
snapshot_start_pre Unique timestamp of AmDTx snapshot start (pre session or standalone)
snapshot_finish_pre Unique timestamp of AmDTx snapshot finish (pre session or standalone)
snapshot_start_post Unique timestamp of AmDTx snapshot start (post session or standalone)
snapshot_finish_post Unique timestamp of AmDTx snapshot finish (post session or standalone)
intent_start Timestamp of user initiating Intent portion of My Moment feature
intent_finish Timestamp of user completing Intent portion of My Moment feature
redeem_date When a coupon was applied to a user’s account
badge List of badges that the user has been awared up to the snapshot/session initiation

This variables appear in the database in the following way:

Selection of time & datetime variables considering user ID and session ID

Date-time formats

There are different time formats for date-time information:

Date-time formats from UTC Time Now
Date-Time Format UTC Date Time Now
UTC 2023-01-25T13:47:24Z
ISO-8601 2023-01-25T13:47:24+0000
RFC 2822 Wed, 25 Jan 2023 13:47:24 +0000
RFC 850 Wednesday, 25-Jan-23 13:47:24 UTC
RFC 1036 Wed, 25 Jan 23 13:47:24 +0000
RFC 1123 Wed, 25 Jan 2023 13:47:24 +0000
RFC 822 Wed, 25 Jan 23 13:47:24 +0000
RFC 3339 2023-01-25T13:47:24+00:00
ATOM 2023-01-25T13:47:24+00:00
COOKIE Wednesday, 25-Jan-2023 13:47:24 UTC
RSS Wed, 25 Jan 2023 13:47:24 +0000
W3C 2023-01-25T13:47:24+00:00
Unix Epoch 1674654444
YYYY-DD-MM HH:MM:SS 2023-25-01 13:47:24
YYYY-DD-MM HH:MM:SS am/pm 2023-25-01 01:47:24 PM
DD-MM-YYYY HH:MM:SS 25-01-2023 13:47:24
MM-DD-YYYY HH:MM:SS 01-25-2023 13:47:24

In BecharaReport data we can see date-time variables formatted in UTC format and W3C format.

In W3C format, the +00:00 at the end of the string means how many hh:mm the time is offset from UTC (Universal Time Coordinated). To know how many hours offset a place is from UTC, we can follow this map time, although in practice there are some exceptions:

UTC offset world map

Let’s enumerate the following incongruencies in the date-time variables:

Format problem

  • W3C format: created, session_start, session_finish, snapshot_start_pre, snapshot_finish_pre, snapshot_start_post, snapshot_finish_post, redeem_date, badge
  • UTC format: intent_start, intent_finish
    • This variables are already defaulted UTC +00:00.

Atypic values instead of NA

  • Intent start & finish are variables present in “My Moment” sessions. In case of other types of sessions the backend is defaulting the missing value as 0001-01-01T00:00Z.

Number of distinct observations equal to 0001-01-01T00:00Z for:

intent_start
type intent_start n %
<snapshot> 0001-01-01T00:00Z 1371 6.17
ac2048 0001-01-01T00:00Z 137 100.00
JourneyActivity 0001-01-01T00:00Z 229 51.12
JourneyLesson 0001-01-01T00:00Z 20482 86.74
JourneyTimer 0001-01-01T00:00Z 433 86.43
Library 0001-01-01T00:00Z 7880 80.38
None 0001-01-01T00:00Z 4 66.67
Timer 0001-01-01T00:00Z 3846 80.88
intent_finish
type intent_finish n %
<snapshot> 0001-01-01T00:00Z 1371 6.17
ac2048 0001-01-01T00:00Z 137 100.00
JourneyActivity 0001-01-01T00:00Z 229 51.12
JourneyLesson 0001-01-01T00:00Z 20482 86.74
JourneyTimer 0001-01-01T00:00Z 433 86.43
Library 0001-01-01T00:00Z 7880 80.38
None 0001-01-01T00:00Z 4 66.67
Timer 0001-01-01T00:00Z 3846 80.88
Number of distinct observations with intent_finish equal to 0001-01-01T00:00Z
intent_finish n %
Other 58004 62.78
0001-01-01T00:00Z 34382 37.22

Time-zone problem

  • The date-time variables define the UTC offset +00:00 according to the location of the user. When some date-time variables are not able to retrieve the GPS location of the user, then it defaults UTC to +00:00.

  • This is a problem because if we are interested in the local time of the user (e.g if the user used the AmDTx app in the morning or at night) then defaulting to +00:00 would change that local time.

  • For example, if the user is in Lima, Peru (UTC -05:00 ) and uses the app at 8:49pm, it would be defaulted to 1:49am.

    Date-time conversion
    Peru time (EST) / UTC -05:00 UTC +00:00
    2023-01-10T20:49:33-05:00 2023-01-11 01:49:33 UTC

Almost all date-time variables are defaulted to UTC +00:00 with the exception of badge variable.

Badge situation

As its definition says, badge represents a list of badges the user gains by using the app. This variable records the user badges in the following format:

  • {badge1,badge2,badge3}
  • For example: {B001:2019-07-18T12:15:40-07:00,B004:2019-07-25T07:00:54-07:00,B003:2019-06-18T12:15:40-07:00}
  • Not necessarily in order.
Displaying distinct observations in badge variable

As we can see, badge variable is recording the UTC offset of the badge gain and this can provide a future solution for the date-time problem if we apply the same principle of date-time recording to the other date-time variables.

Obj 2: Origin of the problem

As in the duplicate’s problem, part of the problem could be in the extraction of the data by BecharaReport API. Although, for example the atypic values problem also appear in the SQLPad database.

Further explanation about the problem could be found in the way SQL’s timestamp & timestampz are being used to store the date-time in AmDTx database. More documentation about this variables could be found in Cockroach Labs, MySQL